Data Preprocessing

  1. First, I investigated which variables are time dependent and excluded those that were clearly unnecessary (i.e., “SITE”, “COLPROT”, “ORIGPROT”, “FLDSTRENG”, “FSVERSION”, “IMAGEUID”, “Month_bl”, “Month”, “M”, “update_stamp”).

  2. Next, I merged the time-dependent and time-independent variables into the long_dat data frame and recoded the time points in the VISCODE variable as integers (baseline = 0).

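library(dplyr)

# Visit codes in chronological order. The position in this vector minus 1
# becomes the integer time point ("bl" = 0, "m03" = 1, ...); codes that are
# not listed become NA.
visit_levels <- c("bl", "m03", "m06", "m12", "m18", "m24",
                  "m30", "m36", "m42", "m48", "m54", "m60",
                  "m66", "m72", "m78", "m84", "m90", "m96",
                  "m102", "m108", "m114", "m120", "m126",
                  "m132", "m144", "m156")

# Keep the selected columns, recode VISCODE, and sort by participant and visit
long_dat <- dat[, c(ivars[, 1], nivars[, 1])] %>%
  mutate(VISCODE = match(VISCODE, visit_levels) - 1) %>%
  relocate(RID, PTID, VISCODE) %>%
  arrange(RID, VISCODE)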
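To make the recoding concrete, here is a small stand-alone illustration of how match() turns visit codes into 0-based time points. The name visit_levels and the example codes are only illustrative, and the level list is shortened to the first ten visits.

```r
# Shortened, illustrative copy of the visit order used above
visit_levels <- c("bl", "m03", "m06", "m12", "m18", "m24",
                  "m30", "m36", "m42", "m48")

# Example visit codes; "m999" is a made-up code that is not in the list
codes <- c("bl", "m06", "m42", "m48", "m999")

# match() returns the 1-based position, so subtracting 1 makes baseline 0;
# codes that are not listed become NA
time_point <- match(codes, visit_levels) - 1
print(time_point)
# [1]  0  2  8  9 NA
```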
  3. The original data frame contained quite a few _bl or _BL (baseline) variables. I therefore checked whether these baseline values had already been integrated into the corresponding time-dependent variable at the matching time point for each participant. They had not.

  4. Therefore, I merged the _bl/_BL variables with the corresponding time-dependent variable for each participant. Additionally, I specified the data type of each variable individually to keep full control over the data structure.

  5. Transform Long to Wide Data Format

## # A tibble: 6 × 1,153
##   RID   PTID         AGE PTGENDER PTEDUCAT PTETHCAT PTRACCAT PTMARRY APOE4 FDG_0
##   <fct> <chr>      <dbl> <fct>       <int> <fct>    <fct>    <fct>   <int> <dbl>
## 1 2     011_S_0002  74.3 Male           16 Not His… White    Married     0  1.37
## 2 3     011_S_0003  81.3 Male           18 Not His… White    Married     1  1.08
## 3 4     022_S_0004  67.5 Male           10 Hisp/La… White    Married     0 NA   
## 4 5     011_S_0005  73.7 Male           16 Not His… White    Married     0  1.29
## 5 6     100_S_0006  80.4 Female         13 Not His… White    Married     0 NA   
## 6 7     022_S_0007  75.4 Male           10 Hisp/La… More th… Married     1 NA   
## # ℹ 1,143 more variables: FDG_2 <dbl>, FDG_7 <dbl>, FDG_11 <dbl>, FDG_12 <dbl>,
## #   FDG_13 <dbl>, FDG_14 <dbl>, FDG_15 <dbl>, FDG_16 <dbl>, FDG_17 <dbl>,
## #   FDG_18 <dbl>, FDG_19 <dbl>, FDG_21 <dbl>, FDG_22 <dbl>, FDG_23 <dbl>,
## #   FDG_24 <dbl>, FDG_3 <dbl>, FDG_4 <dbl>, FDG_5 <dbl>, FDG_6 <dbl>,
## #   FDG_9 <dbl>, FDG_8 <dbl>, FDG_10 <dbl>, FDG_25 <dbl>, FDG_20 <dbl>,
## #   FDG_1 <dbl>, PIB_0 <dbl>, PIB_2 <dbl>, PIB_7 <dbl>, PIB_11 <dbl>,
## #   PIB_12 <dbl>, PIB_13 <dbl>, PIB_14 <dbl>, PIB_15 <dbl>, PIB_16 <dbl>, …
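The baseline check, the baseline merge, and the long-to-wide step could look roughly like the sketch below. It uses a toy data frame with made-up values and a single measure; the object names (long, long_merged, wide) and the exact rules for checking and filling are assumptions, not the code used for this report.

```r
library(dplyr)
library(tidyr)

# Toy long-format data (made-up values); FDG_bl repeats the baseline value
long <- tibble(
  RID     = c(101, 101, 102, 102),
  VISCODE = c(0, 2, 0, 2),
  FDG     = c(NA, 1.30, 1.10, 1.05),
  FDG_bl  = c(1.40, 1.40, 1.10, 1.10)
)

# Check: is the baseline value already present at time point 0 for everyone?
already_integrated <- long %>%
  filter(VISCODE == 0) %>%
  summarise(ok = all(!is.na(FDG) & FDG == FDG_bl)) %>%
  pull(ok)

# Merge: fill a missing time-point-0 value from the _bl column, then drop it
long_merged <- long %>%
  mutate(FDG = if_else(VISCODE == 0 & is.na(FDG), FDG_bl, FDG)) %>%
  select(-FDG_bl)

# Long to wide: one row per participant, one column per time point
wide <- long_merged %>%
  pivot_wider(id_cols = RID, names_from = VISCODE, values_from = FDG,
              names_prefix = "FDG_")
```

With several measures, passing all of them to values_from makes pivot_wider name the columns as variable, underscore, time point, which matches the FDG_0, FDG_2, ... pattern in the output above.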

Attrition Analysis

Based on the number of participants measured at each time point, I made a frequency plot to get a first idea of the sampling frequency.

Domains (one frequency plot each; figures not reproduced here): Demographics, Cognitive Tests, Biomedical Imaging, Biomarkers

Based on these findings, time point 9 appears to be a cut-off after which the number of measurements drops quite strongly. With the 0-based VISCODE recoding (bl = 0, m03 = 1, ...), time point 9 corresponds to month 48 (i.e., 4 years) of the follow-up.
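A minimal sketch of how the per-time-point frequencies behind such a plot can be computed is given below. The toy data, the choice of MMSE as the example measure, and the name freq are assumptions for illustration only.

```r
library(dplyr)

# Toy long-format data (made-up values): one row per participant and visit
long <- data.frame(
  RID     = c(1, 1, 1, 2, 2, 3),
  VISCODE = c(0, 1, 2, 0, 1, 0),
  MMSE    = c(29, 28, NA, 30, 27, 25)
)

# Number of distinct participants with a non-missing measurement per time point
freq <- long %>%
  filter(!is.na(MMSE)) %>%
  group_by(VISCODE) %>%
  summarise(n_participants = n_distinct(RID), .groups = "drop")

print(freq)
```

The resulting table can be passed directly to a bar plot with VISCODE on the x-axis and n_participants on the y-axis.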

Polygenic Risk Score for Educational Attainment

By default, merge(x, y, by.x, by.y) performs an inner join: the new data frame keeps only those rows for which the key (in our case PTID) matches in both inputs. We have genetic data from 2 additional individuals for whom no other measurements exist, and these are dropped by the merge. The final sample for which both testing data and genetic data are available is thus N = 1408.
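The inner-join behaviour of merge() can be seen in the following toy example. The data frames pheno and pgs, the key column IID, and all values are hypothetical; only the PTID key is taken from the text.

```r
# Hypothetical phenotype data keyed by PTID
pheno <- data.frame(
  PTID     = c("P1", "P2", "P3"),
  PTEDUCAT = c(16, 18, 10)
)

# Hypothetical genetic data keyed by IID; P4 and P5 have no phenotype data
pgs <- data.frame(
  IID    = c("P2", "P3", "P4", "P5"),
  PGS_EA = c(0.4, -1.1, 0.2, 0.9)
)

# Default all = FALSE: only keys present in both data frames are kept
merged <- merge(pheno, pgs, by.x = "PTID", by.y = "IID")

# Individuals with genetic data but no other measurements
dropped <- setdiff(pgs$IID, pheno$PTID)

print(merged)
print(dropped)
```

Setting all.y = TRUE instead would keep the unmatched genetic records, with NA in the phenotype columns.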

Plot PGS EA vs. Actual EA

Based on this plot, we can see a positive relationship between the polygenic score for educational attainment (PGS EA) and actual years of education: participants with a higher PGS tend to have completed more years of education.

We ran Pearson’s correlation, which resulted in r = 0.286 (p-value < 2.2e-16).
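For reference, a Pearson correlation of this kind can be obtained with cor.test(). The sketch below runs on simulated data, so the variable names pgs and edu and the resulting numbers are illustrative and unrelated to the values reported above.

```r
set.seed(1)  # simulated data, not the study data
n   <- 500
pgs <- rnorm(n)                             # hypothetical standardised PGS
edu <- 16 + 0.8 * pgs + rnorm(n, sd = 2.5)  # hypothetical years of education

# Pearson's product-moment correlation with a two-sided test
ct <- cor.test(pgs, edu, method = "pearson")

ct$estimate  # Pearson's r
ct$p.value   # p-value of the test that r equals 0
```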

Create Residuals

To obtain the residuals, we regressed actual EA on the polygenic risk score for educational attainment, including SEX and AGE as covariates. The distribution of the residuals is depicted in the density plot.

How to interpret the Residuals?

It is important to interpret the residual scores correctly. A residual is the actual value minus the predicted value, so a high (positive) residual means that the individual attained more years of education than the model predicted, i.e., over-performed relative to his or her genetic prediction. A negative residual means the opposite. The following table illustrates this:

##   Actual Predicted  Residuals
## 1     18  16.91911  1.0808864
## 2     16  15.16815  0.8318481
## 3     12  16.64336 -4.6433625
## 4     20  16.02560  3.9743989
## 5     14  14.83958 -0.8395765
## 6     13  15.37284 -2.3728412
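A minimal sketch of such a residual computation is shown below, assuming a linear model with actual EA as the outcome and the PGS, SEX and AGE as predictors. The data are simulated and the column names are placeholders, so this is not the exact model specification used for the table above.

```r
set.seed(2)  # simulated data, not the study data
n <- 300
d <- data.frame(
  PGS = rnorm(n),
  AGE = runif(n, 55, 90),
  SEX = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
d$EA <- 16 + 0.8 * d$PGS - 0.02 * d$AGE + 0.5 * (d$SEX == "Male") +
  rnorm(n, sd = 2.5)

# Regress actual EA on the PGS with SEX and AGE as covariates
fit <- lm(EA ~ PGS + SEX + AGE, data = d)

d$Predicted <- unname(fitted(fit))
d$Residuals <- unname(residuals(fit))

# A residual is the actual value minus the model prediction
head(data.frame(Actual = d$EA, Predicted = d$Predicted, Residuals = d$Residuals))

# Same arithmetic as the first row of the table above: 18 - 16.91911
18 - 16.91911
# [1] 1.08089
```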

It is interesting to see that the residuals are not normally distributed. Does this suggest that we should continue with non-parametric analysis techniques?

Survival Analysis

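Using the ntile function from dplyr, each participant is assigned to a tertile of the residual score: the lower tertile receives value 1 (the most negative residuals), the middle tertile value 2, and the upper tertile value 3 (the most positive residuals). The survival analyses below compare tertile 1 with tertile 3; the grouping variable is named thirtile in the output.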
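As a small illustration of the tertile assignment, the six residuals from the table above can be passed to ntile(). The vector names are illustrative; with only six values each tertile contains exactly two observations.

```r
library(dplyr)

# Residuals copied from the six-row example table above
residuals_ea <- c(1.0808864, 0.8318481, -4.6433625,
                  3.9743989, -0.8395765, -2.3728412)

# ntile() ranks the values and splits them into 3 roughly equal-sized groups:
# 1 = lowest residuals, 3 = highest residuals
thirtile <- ntile(residuals_ea, 3)
print(thirtile)
# [1] 3 2 1 3 2 1
```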

Mini-Mental State Examination (MMSE)

“The mini–mental state examination (MMSE) is a 30-point questionnaire that is used extensively in clinical and research settings to measure cognitive impairment. It is commonly used in medicine and allied health to screen for dementia. It is also used to estimate the severity and progression of cognitive impairment and to follow the course of cognitive changes in an individual over time; thus making it an effective way to document an individual’s response to treatment. Administration of the test takes between 5 and 10 minutes and examines functions including registration (repeating named prompts), attention and calculation, recall, language, ability to follow simple commands and orientation. […] Any score of 24 or more (out of 30) indicates a normal cognition. Below this, scores can indicate severe (≤9 points), moderate (10–18 points) or mild (19–23 points) cognitive impairment.” (Wikipedia.org)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), MMSE_cut) ~ thirtile, 
##     data = .)
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 2800      507      399      29.5      62.8
## thirtile=3 2745      284      392      30.0      62.8
## 
##  Chisq= 62.8  on 1 degrees of freedom, p= 2e-15
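The log-rank comparison reported by survdiff() can be sketched as follows. How MMSE_cut was actually constructed is not shown in this report; the sketch assumes an event indicator defined as an MMSE score below 24 (the cut-off quoted above), one row per participant, and made-up toy values.

```r
library(survival)

# Toy data (hypothetical values): time point of the event or of the last visit
d <- data.frame(
  time    = c(2, 3, 5, 9, 4, 6, 3, 8, 9, 7, 9, 9),
  mmse    = c(22, 21, 23, 28, 20, 23, 27, 29, 30, 22, 28, 26),
  tertile = rep(c(1, 3), each = 6)
)

# Assumed event definition: 1 = MMSE below 24, 0 = censored
d$MMSE_cut <- as.integer(d$mmse < 24)

# Log-rank test comparing the lower and the upper residual tertile
fit <- survdiff(Surv(time, MMSE_cut) ~ tertile, data = d)
print(fit)

# p-value from the chi-square statistic with (number of groups - 1) df
p_value <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
```

In the printed table, Observed is the number of events per group and Expected is the number that would be expected if both groups shared the same survival curve.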

Alzheimer’s Disease Assessment Scale

ADAS11

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADAS11_cut) ~ thirtile, 
##     data = .)
## 
## n=5540, 5 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 2799      217      169      13.4      27.7
## thirtile=3 2741      119      167      13.6      27.7
## 
##  Chisq= 27.7  on 1 degrees of freedom, p= 1e-07

ADAS13

“The ADAS13 was included as a global measure of cognitive function. ADAS13 is a test battery developed to assess severity of cognitive impairment associated with AD and includes subtests and clinical evaluations assessing memory function, reasoning, language function, orientation and praxis. The ADAS13 is a modified version of the original ADAS-Cog-11, adding a cancellation task and a delayed free recall task. The higher the scores, the more severe impairment of cognitive function.” (Mofrad et al., 2021)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADAS13_cut) ~ thirtile, 
##     data = .)
## 
## n=7290, 27 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3649      255      206      11.9      24.1
## thirtile=3 3641      157      206      11.8      24.1
## 
##  Chisq= 24.1  on 1 degrees of freedom, p= 9e-07

ADASQ4

## Warning in pchisq(chi, df, lower.tail = FALSE): NaNs produced
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADASQ4_cut) ~ thirtile, 
##     data = .)
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3659        0        0       NaN       NaN
## thirtile=3 3658        0        0       NaN       NaN
## Warning in pchisq(x$chisq, df, lower.tail = FALSE): NaNs produced
## 
##  Chisq= 0  on -1 degrees of freedom, p= NA

CDRSB

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), CDRSB_cut) ~ thirtile, 
##     data = .)
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3659      433      341      24.6      50.8
## thirtile=3 3658      252      344      24.5      50.8
## 
##  Chisq= 50.8  on 1 degrees of freedom, p= 1e-12

DIGITSCOR

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), DIGITSCOR_cut) ~ 
##     thirtile, data = .)
## 
## n=4029, 3288 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 2146       31     61.4      15.1      31.3
## thirtile=3 1883       88     57.6      16.1      31.3
## 
##  Chisq= 31.3  on 1 degrees of freedom, p= 2e-08

Patient’s Everyday Cognition (EcogPtDivatt)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtDivatt_cut) ~ 
##     thirtile, data = .)
## 
## n=3888, 3429 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1820       93     96.3     0.110     0.211
## thirtile=3 2068      110    106.7     0.099     0.211
## 
##  Chisq= 0.2  on 1 degrees of freedom, p= 0.6

Study Partner’s Everyday Cognition (EcogSPVisspat)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPVisspat_cut) ~ 
##     thirtile, data = .)
## 
## n=3953, 3364 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1846      233      205      3.85       7.8
## thirtile=3 2107      191      219      3.60       7.8
## 
##  Chisq= 7.8  on 1 degrees of freedom, p= 0.005

FAQ

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), FAQ_cut) ~ thirtile, 
##     data = .)
## 
## n=7308, 9 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3659      425      364      10.3      21.3
## thirtile=3 3649      304      365      10.3      21.3
## 
##  Chisq= 21.3  on 1 degrees of freedom, p= 4e-06

LDELTOTAL

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), LDELTOTAL_cut) ~ 
##     thirtile, data = .)
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3659      120      223      47.7      96.7
## thirtile=3 3658      329      226      47.2      96.7
## 
##  Chisq= 96.7  on 1 degrees of freedom, p= <2e-16

MOCA

Reference literature: doi: 10.1111/j.1532-5415.2005.53221.x

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), MOCA_cut) ~ thirtile, 
##     data = .)
## 
## n=3896, 3421 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1819      328      418      19.4      39.2
## thirtile=3 2077      545      455      17.8      39.2
## 
##  Chisq= 39.2  on 1 degrees of freedom, p= 4e-10

Rey Auditory Verbal Learning Test (RAVLT)

The RAVLT was included as a measure of memory function. In this test, the participants are asked to recall words from a list of 15 nouns immediately after each of five learning trials and after a short and a long delay. Two measures known to be sensitive to cognitive changes in patients with AD were included in the present study: Immediate recall (RAVLT-Im): the number of correct responses across the immediate recall of the five learning trials; percent forgetting (RAVLT-PF): the score on the fifth learning trial minus the score on the long delayed recall, divided by the score obtained on the fifth learning trial. The lower the scores, the more severe impairment of cognitive function.

RAVLT Forgetting

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_forgetting_cut) ~ 
##     thirtile, data = .)
## 
## n=7299, 18 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3651       68     83.5      2.89      5.77
## thirtile=3 3648      100     84.5      2.85      5.77
## 
##  Chisq= 5.8  on 1 degrees of freedom, p= 0.02

RAVLT Immediate

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_immediate_cut) ~ 
##     thirtile, data = .)
## 
## n=7299, 18 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3651       47      142      63.3       127
## thirtile=3 3648      238      143      62.6       127
## 
##  Chisq= 127  on 1 degrees of freedom, p= <2e-16

RAVLT Learning

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_learning_cut) ~ 
##     thirtile, data = .)
## 
## n=7299, 18 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3651       95      126      7.47      14.9
## thirtile=3 3648      158      127      7.37      14.9
## 
##  Chisq= 14.9  on 1 degrees of freedom, p= 1e-04

RAVLT Percentage Forgetting

## Warning in pchisq(chi, df, lower.tail = FALSE): NaNs produced
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_perc_forgetting_cut) ~ 
##     thirtile, data = .)
## 
## n=7290, 27 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3642        0        0       NaN       NaN
## thirtile=3 3648        0        0       NaN       NaN
## Warning in pchisq(x$chisq, df, lower.tail = FALSE): NaNs produced
## 
##  Chisq= 0  on -1 degrees of freedom, p= NA

TRABSCOR

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), TRABSCOR_cut) ~ 
##     thirtile, data = .)
## 
## n=7252, 65 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 3619      503      386      35.8      72.5
## thirtile=3 3633      274      391      35.2      72.5
## 
##  Chisq= 72.5  on 1 degrees of freedom, p= <2e-16

Patient’s Everyday Cognition (EcogPtLang)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtLang_cut) ~ 
##     thirtile, data = .)
## 
## n=3919, 3398 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1830      133      111      4.56      8.78
## thirtile=3 2089      100      122      4.11      8.78
## 
##  Chisq= 8.8  on 1 degrees of freedom, p= 0.003

Patient’s Everyday Cognition (EcogPtMem)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtMem_cut) ~ 
##     thirtile, data = .)
## 
## n=3925, 3392 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1828       59     65.4     0.618      1.18
## thirtile=3 2097       79     72.6     0.556      1.18
## 
##  Chisq= 1.2  on 1 degrees of freedom, p= 0.3

Patient’s Everyday Cognition (EcogPtOrgan)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtOrgan_cut) ~ 
##     thirtile, data = .)
## 
## n=3855, 3462 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1787      129      125     0.119     0.228
## thirtile=3 2068      136      140     0.106     0.228
## 
##  Chisq= 0.2  on 1 degrees of freedom, p= 0.6

Patient’s Everyday Cognition (EcogPtPlan)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtPlan_cut) ~ 
##     thirtile, data = .)
## 
## n=3915, 3402 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1828      133      104      7.84      15.2
## thirtile=3 2087       86      115      7.14      15.2
## 
##  Chisq= 15.2  on 1 degrees of freedom, p= 1e-04

Patient’s Everyday Cognition (EcogPtTotal)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtTotal_cut) ~ 
##     thirtile, data = .)
## 
## n=3919, 3398 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1830      121      107      1.95      3.76
## thirtile=3 2089      103      117      1.77      3.76
## 
##  Chisq= 3.8  on 1 degrees of freedom, p= 0.05

Patient’s Everyday Cognition (EcogPtVisspat)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtVisspat_cut) ~ 
##     thirtile, data = .)
## 
## n=3897, 3420 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1827      117      112     0.260     0.504
## thirtile=3 2070      116      121     0.239     0.504
## 
##  Chisq= 0.5  on 1 degrees of freedom, p= 0.5

Study Partner’s Everyday Cognition (EcogSPDivatt)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPDivatt_cut) ~ 
##     thirtile, data = .)
## 
## n=3913, 3404 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1843      251      215      6.06      12.3
## thirtile=3 2070      190      226      5.76      12.3
## 
##  Chisq= 12.3  on 1 degrees of freedom, p= 5e-04

Study Partner’s Everyday Cognition (EcogSPLang)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPLang_cut) ~ 
##     thirtile, data = .)
## 
## n=3989, 3328 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1872      188      169      2.11      4.29
## thirtile=3 2117      157      176      2.03      4.29
## 
##  Chisq= 4.3  on 1 degrees of freedom, p= 0.04

Study Partner’s Everyday Cognition (EcogSPMem)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPMem_cut) ~ 
##     thirtile, data = .)
## 
## n=3989, 3328 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1872      179      150      5.74      11.7
## thirtile=3 2117      125      154      5.57      11.7
## 
##  Chisq= 11.7  on 1 degrees of freedom, p= 6e-04

Study Partner’s Everyday Cognition (EcogSPOrgan)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPOrgan_cut) ~ 
##     thirtile, data = .)
## 
## n=3850, 3467 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1791      231      217     0.950      1.93
## thirtile=3 2059      216      230     0.893      1.93
## 
##  Chisq= 1.9  on 1 degrees of freedom, p= 0.2

Study Partner’s Everyday Cognition (EcogSPPlan)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPPlan_cut) ~ 
##     thirtile, data = .)
## 
## n=3959, 3358 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1855      267      226      7.44      15.1
## thirtile=3 2104      197      238      7.06      15.1
## 
##  Chisq= 15.1  on 1 degrees of freedom, p= 1e-04

Study Partner’s Everyday Cognition (EcogSPTotal)

## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPTotal_cut) ~ 
##     thirtile, data = .)
## 
## n=3981, 3336 observations deleted due to missingness.
## 
##               N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile=1 1871      246      215      4.61      9.39
## thirtile=3 2110      192      223      4.42      9.39
## 
##  Chisq= 9.4  on 1 degrees of freedom, p= 0.002